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The strength of perpendicular anisotropy is known to drive the spin re- 
orientation in thin magnetic films. Here we consider the effect different order 
anisotropics have on two phase transitions; the spin reorientation transition 
and the orientational order transition. We find that the relative magnitude 
of different order anisotropics can significantly enhance or suppress the de- 
gree to which the system reorients. Specifically Monte Carlo simulations 
reveal significant changes in the cone angle and planar magnetization. In 
order to facilitate rapid computation we have developed a stream processing 
technique, suitable for use on GPU systems, for computing the transition 
probabilities in two dimensional systems with dipole interactions. 



1 Introduction 



In two dimensions the effect of thermal fluctuations is enhanced. The num- 
ber of possible symmetries is lower than in three dimensional systems and 
this reduced symmetry means that there are fewer degrees of freedom to 
absorb energy [UEIE]. The discovery of a divergence in the susceptibility in 
the XY model [I], caused by topological excitations [H [6], has meant that 
the existence and stability of spontaneous ordered states in two dimensions 
has been a rich and often contentious [ZlElllElIIOlEllIIlIiailHllH^ area 
of theoretical interest. In particular the dimensionality of the field|15t 116]. 
finite size effects [17J and anisotropics [18] can effect the phase diagram. 
Two dimensional systems can be realized experimentally as thin (typically < 
15 atomic layer) magnetic films. By varying the composition and thickness of 
thin films a large variety of magnetic properties have been obtained[19j. It is 
possible to create films which strongly favor either in-plane or perpendicular 
orientation of the magnetization [20]. The functional dependence of energy 
on perpendicular magnetization varies and can be altered using ion beam 
irradiation [2U [22]. In the presence of strong uni-axial anisotropy favoring 
perpendicular alignment, the competition between the entropy favored in- 
plane magnetization and the energetically favored out of plane state can 
lead to a temperature driven spin reorientation transition [23, l24f [25]. The 
nature of this transition is known to be dependent on the relative strengths 
of different orders of anisotropy [261 E2 EE]- In these systems the ratio of 
long range dipole coupling and short range exchange coupling can lead to 
the formation of striped domains of alternating spin direction; either as the 
ground state [29j [30] or as a mechanism of spin reversal [31J. Films have 
been produced in with stripes observed running parallel with a common 
orient at ion [3T| \30\ [32]. as zig-zags between regular defects [33] or forming 
complex patterns with no orientational order [3U[30]. The width and mobil- 
ity of stripes depends on temperature [35]. In particular these can systems 
display strong thermal memory, in which the domain configuration depend 
on the rate of heating or cooling [36j [37] . Both the reorientation and stripe 
melting transitions have been studied analytically [381 GSl SOI [41] and using 
Monte Carlo Simulation [421 [43] . 

Here we propose a technique in which energy differences arising from long 
range dipole coupling are approximated. This approximation allows for com- 
putation to be parallelized significantly; reducing the computational time 
required. Having first examined the extent to which this approximation in- 
fluences the results of simulation, the method is applied to a two dimensional 
Heisenberg model where the effects of higher order anisotropy are examined. 
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2 Theory 



2.1 Dipole Coupling 

Consider a thin ferromagnet, modeled as a two dimensional square lattice 
of Heisenberg spins (s E S 2 ). The spins experience a long range dipole 
interaction 



Where n and m represent vertices of the two dimensional lattice, f nm is the 
vector from n to m and Co is a constant, Cd — (M 2 po) / (Air) . 
In simulations periodic boundary conditions are used to approximate an 
infinite system. The total system consists of a tiling of replicas. For sim- 
ulation size LxL, spins separated by vector G — (aL, bL) with a,f) G Z 
are identical. To compute the infinite sum introduced by periodic boundary 
conditions a new set of coordinates is introduced; r nm = G + Pnm- Here p 
restricted to p — {p Xl p y ), with p Xl p y E [0, L}. The dipole energy at a site n 
can then be written (taking the square lattice to be in the x — y plane), 



2 ^ - n +G -r 

i 

where s% represents s n .a and expressions are summed over repeated Greek 
indexes. In order to achieve efficient computation Y% lim r ^o d a dR-rz — 1 =* a 

can be calculated in advance for all choices of n, m, a and f3. Since the sum 
is slow to converge it can be split into a short range real space term and a 
long range Fourier space term according to the technique described by Harris 
[H] based on the analogous three dimensional case developed by Ewald [15] . 
Letting f(f) = ^ i one has / = h + fs F] 



1 While our results take place in the x — y plane the general results given below contain 
Znm and are suitable for three dimensional systems that are infinite in two directions. 
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and 



where 



f L (f) = l/L 2 J2h L (k,z)exp(i27rk-r) (4) 



h L (k,z) = l e - in '& m [e k \ z -^™\ertc( \- Znm| + krj\ 
k \ 2rj J 

+e -k\z-z nm \ ertc (zh_^A + k1j y 



(5) 



Despite the efficiency gained by pre-calculating the interactions and using 
this rapid summation technique, the calculation of the dipole interaction is 
still computationally intensive. In order to calculate the energy of a single 
spin in the system one must calculate N = I? interactions, when calculating 
the energy of a state of the system interactions are required. For a 
moderate system size L = 64 this equates to 4096 interactions for a single 
spin and over 8 x 10 6 for a single state. 

2.2 Monte Carlo Simulation 

When considering a system at finite temperature observable quantities O 
are calculated as expectation values of the Boltzmann distribution: 

E i O[^]exp(^) 

(O) = z {6) 

In order to approximate the properties of this distribution a subset of pos- 
sible states is selected using a Markov chain Monte Carlo with transition 
function 

. f exp ( ~ {H h k ~ T Hj) ) for (H k - HA > 

Mb ^fa) = { v kBT > 3 (7) 

[ 1 otherwise 

Known as the metropolis algorithm [46J, Eqn. Q does not define a method 
for selecting the prospective new state. There are numerous methods for 
constructing new states and the decision is based largely on the system being 
analyzed. In the case of magnetic systems, the simplest and most common 
choice is single spin flips. For a system of size N one Monte Carlo step (MC 
step) requires N spin flips. Herein lies the computational difficulty, in order 
to complete one MC step in a two dimensional system one must calculate 
the energy of a single spin N = L 2 times. If there is dipole coupling present 
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each energy calculation requires L 2 interactions to be computed. In order to 
compute MC steps more efficiently we present a stream processing algorithm 
in section [3] that reduces this computational load. 

3 GPU Parallel Programming and Simultaneous 
flipping 

In order to perform Monte Carlo simulations at an acceptable speed we 
make use of a graphics processing unit (GPU). Unlike most modern com- 
puting systems that implement a Harvard execution model, GPU computing 
employs a stream processing model. In stream processing a single function 
(kernel) is executed simultaneously on a large number of different inputs 
(the stream). Importantly the execution of each input (thread) is indepen- 
dent and there is no communication between threads [47] . For spin lattice 
models where interactions are limited to nearest or next nearest neighbor 
the problem of implementing a parallel GPU algorithm has been examined 
previously and several algorithms exist to distribute computation over single 
[Ml HH1 [50] or multiple GPUs [51]. In these cases multiple single spin flips 
can be performed simultaneously (provided potential update sites are not 
nearest neighbors). In recent work, Campos et al. [52] have approached the 
problem of long range coupling by parallelizing the long range sum in three 
dimensions. Here we go beyond the work of Campos et al. and focus on 
performing multiple simultaneous MC steps. 

3.1 Algorithm 

Here we describe a method for parallelizing MC simulations in the presence 
of dipole coupling. A pseudo code implementation of the algorithm is given 
in appendix [Aj The algorithm depends on the size of the system L and two 
parameters that will be defined below: / and P. For clarity of exposition, 
figures in this section will use a fixed small system size L = 8 and the pa- 
rameters / = 4 and P = 4 (Z and P need not be equal in general). When 
describing the algorithm we take the 'host' to indicate any computation not 
performed on the GPU. Calculations run on host are implemented in the 
normal serial fashion. We will refer to the GPU card as the 'device' and 
calculations run on device are implemented as parallel operations and have 
access to device memory (VRAM). 

Initially the current state of the system is held in the device memory, either 
from initialization or from the previous iteration of the algorithm. On the 
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host, a site is selected at random, in Fig. [I] this site is denoted by a blue 
circle numbered 1. Additional sites are then selected at fixed multiples of 
I according to i = (a/, bl) + i\ for a, b E [0,L//]. These n = (L/l) 2 values 
of i are the update sites and are represented in Fig. [I] as circles numbered 
2, 3 and 4. Next for each site a new spin value is selected at random as 
the potential new spin values. In Figs. [TJ [3] and [I] these potential new spin 
values are represented as diamonds. The location of the selected sites and 
the potential new spin values are copied to the device memory. 
The device then launches 2n threads to calculate nearest neighbor exchange 
coupling^} The nearest neighbor spins accessed by these threads are indi- 
cated as gray boxes in Fig. [T] Simultaneously the system launches 2Pn 
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Figure 1: An 8x8 sample where in each square represents a spin site. Site 
1 is chosen at random. Update sites 2,3 and 4 are selected at fixed distances 
from 1. In addition 4 alternative spin values are also generated (indicated 
here as diamonds) giving a total of (2L 2 )// 2 = 8 energies that need to be 
calculated. Gray squares indicate nearest neighbor sites used in calculation 
of short range interactions. 



threads to calculate the dipole interactions. Each thread calculates the in- 
teraction between a spin at an update site and spins in a vertical sub-section 
of the total system with width L/P. Fig. [2]presents a subset of these threads 
that calculate the interactions associated with a single update site. When 

2 Two threads per update site, to calculate the current energy and the energy of the 
potential new spins 
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Figure 2: The dipole interactions between update site 1 (blue circle) and 
the rest of the system are parallelized into four threads which compute 
a sub set of the possible interactions shown here as pink shaded squares. 
Simultaneously threads will be executing the calculation of dipole coupling 
for the other seven spin values. 

the above threads are complete all interactions have been computed. For 
each of the n potential update sites the results of both the current spin and 
the potential new spin are passed into a single thread. Each of the n new 
threads calculates any single site energies (anisotropics and applied fields), 
then applies the metropolis algorithm and updates the state accordingly( see 
Fig [3]). In Fig. [I] all the threads executed in one iteration of the algorithm 
for the hypothetical small system are displayed. 

3.2 Approximation 

The algorithm presented here reduces the computation required by simulta- 
neously executing P x (L/l) partial sums of size L/P. However in doing so 
an approximation has been made. In Fig. [5j a single thread will compute 
the dipole interaction between a potential update site (blue circles in the 
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Figure 3: One of the n simultaneous spin flipping threads. The original spin 
value and the alternative spin value for site 1 each require five threads to 
compute all interactions. One thread to calculate the short range interac- 
tions with gray squares in Fig. [I] (represented by a gray arrow) and P — 4 
threads to calculate the dipole interactions with the subsystems shown as 
pink squares in Fig. [2] (represented by pink arrows). The results of the 
ten threads are then fed into a single thread that calculates the flipping 
probability for update site 1. 



n ffl n s 

Figure 4: For each of the four current spins and each potential new spin 
the five threads shown in Fig. [3] are computed the results are fed into 
n — 4 threads that calculate the flipping probability. One of the possible 2 n 
possible results spin updates is shown. 

figure) and a subset of the sample of width L/P (pink). Some subsets will 
contain another potential update site (the interactions of which are being 
processed simultaneously another thread). If two spin updates were com- 
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Figure 5: The interactions calculated by two of simultaneously executed 
threads. On the left site 2 interacts with a section of the system that contains 
site 1. On the right site 1 interacts with a section of the system that contains 
site 2. 



puted without parallelization, there are four possibilities for the new state 
4>i given in Fig. [6j neither spin is flipped (0o), the first spin flips (0i), the 
second spin flips (02) or both spins flip (03). Denote the probability that 
the first spin flips as Pi, the probability that the second spin flips given the 
first spin has flipped as P12 and that the second flips given the first spin is 
not flipped as P\ 2 . Then the probability that system finishes in state 0o, 
is Vo = ->Pi-Pi2, where we have used ->Pi to denote the probability that 
Pi does not occur (-.Pi = (1 - Pi)). Similarly V\ = Pi^Pu, V 2 = ^P\P\2 
and P3 = ->PiPi2- If the two spins don't interact then P12 = P\2 and de- 
pends only on the energies Hq and H2. In this case simultaneously updating 
sites gives the same statistics as the conventional sequential flipping. If the 
spins do interact, as is the case with long range coupling, then the above 
algorithm makes the approximation P12 = Pr(0i — ^ 03) ~ Pt(4>0 02 ) or 
H,] — H\ ^ Hz — Ho. 

In order to approximate the size of this error first denote original spin values 
as si and 52 and the potential new values as Si and §2- Then 0o — (^1, s 2 ), 
01 = (Si, s 2 ) , 02 = (<si, S2) and 03 = (Si, §2). Assuming that spins are not 
nearest neighbors then the error €12 = \H% — Hi — H 2 + Hq\ depends only 
on the dipole energy. The error is 



r 12 
r l2 



(8) 



9 




Figure 6: Possible out comes of two potential spin flips and the associated 
probabilities. 

where Si = s± - Si, 82 = S2 - S2 and V = \Si • 82 ~ 3(5i • fu) (82 ' ^12) |- We 
wish to find the maximal values of V. There are two cases in which local 
maximums occur. The first case, which we shall refer to as the perpendicular 
case, is Si = ±82 = ±(0,0,2) corresponding to V = 2. The second case, 
which we shall refer to as the planar case, is Si = ±#2 = ±2f 12 corresponding 
to V = 4. 

Here we have calculated the maximum error introduced by flipping two 
spins simultaneously. However, due to the periodic boundaries conditions 
described in section |2.1| flipping a spin also flips its image in each replica 
that makes up the infinite system. For the perpendicular case each replica 
introduces an additional error cr = (2Cd)/(\G + ri2| 3 ). For the in-plane 
case the maximum V — 4 requires Si and 82 to be parallel to the vector 
connecting the updated spins, T12 + G, which will not be possible for all G. 
To estimate the error for the in-plane case let 

r } 2 + ^ =(cos(7),sin( 7 ),0) (9) 

then fixing the Si || Si gives V — 1 — 3cos(7) 2 . Since simultaneously flipped 
spins and their replicas will exist in all possible directions we take the root 
mean squared average with respect to 7 giving D«2. Based on this argu- 
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ment we estimate the maximum error in H% — Hi as 



n n 



2C D 



^Total — 



E EE 



^((^ + G <z)2 + (j7 + Gy) 2 )3 



(10) 



n j=—n 



The nipping probability will now be exp 




) . At high T 



the Boltzmann probability is more uniform and the error is reduced. For 
L = 64, I = 32 and C^k B T = 0.1 the error in the nipping probability is 
approximately 0.14%. This error places an upper bound on the error for a 
single spin flip, however, it does not preclude the possibility of large errors 
accumulated over the many millions of spin flips that will be performed 
in simulation. In Appendix [B] we present the results of simulating the two 
dimensional Ising model with strong dipole interactions. We find that, when 
compared with conventional techniques, the algorithm produces a maximum 
error of around 8% on the obtained critical values. 

4 Fourth order Anisotropies in the Heisenberg Model 

In two dimensional magnetic systems the phase diagram depends on two 
ratios. The first is the ratio of perpendicular anisotropy to dipole coupling. 
In two dimensions the dipole energy favors in-plane ordering of spins at 
T = [121 S3] , however a sufficiently strong perpendicular anisotropy can 
create a ground state with perpendicular spins. At finite temperature the 
free energy F — E — TS is minimized and the higher entropy in-plane state 
can become favored and the spin reorientation transition occurs. The other 
ratio determining the possible phases of the system is exchange to dipole 
coupling. For sufficiently strong dipole coupling the system forms stripes 
in the ground state, with stronger dipole coupling favoring thinner stripes. 
The total energy for such a system is given by 



Where Ha% is the single site magnetic anisotropy. In the absence of stripes 
the spin reorientation transition is known to depend on the higher order 
anisotropy terms [23l [2U ES]. We are interested on the effect of varying 
the ratio of second to fourth order anisotropy while keeping the anisotropic 



(hj) 



(11) 
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energy difference between the in-plane and out of plane spins constant. The 
anisotropy is defined as 

H Ai = K((l - a)(si ■ z) 2 + a(si ■ z) 4 ) (12) 

with K < and a > — 1. Here K represents the strength of the anisotropy 
and a determines the ratio of fourth order to second order anisotropics. The- 
oretically the dependence of spin orientation on higher order anisotropy has 
been studied as a function of thickness [53] and temperature O [55]. In 
these cases reorientation of spins is modeled as a competition between com- 
peting anisotropy terms which depend on temperature. In our simulations 
anisotropy is considered constant. 

Previously the case of a — and varying K has been examined by White- 
head et al. [23]. The broad thermal phase evolution, ordered stripes at low 
temperature followed by in-plane magnetization followed by the paramag- 
netic transition, is reasonably well understood [HJ[33]. The behavior near 
to the SRT is not well understood, experiments performed on Pt/Co(0.5 
nm) /Pt films by Bergeard et al. [56] have indicated long time scale dynam- 
ics consisting of a regions fluctuating between perpendicular stripes and in 
plane magnetic order. The authors note that near to the SRT, quadratic 
coupling alone is not sufficient to account for this mixed behavior. Using 
AC susceptibility studies of striped phases in Fe/Ni films by Abu-Libdeh et 
al. [36j E3 EI] , have also indicated the presence of long time scale dynamics. 
By varying the order of anisotropy we wish to investigate the nature of the 
phase transition between the striped and in plane phases. 
Here increasing the value of a suppresses states with canting. We consider 
three choices of parameter a; a = — 1 corresponding to a system that favors 
canting (F), a — 2 corresponding to a state with suppressed canting (S) and 
a — corresponding to an intermediate propensity for canting (I). 
We understand this as follows. Consider the restoring force due to anisotropy 
experienced by a spin slightly canted away from perpendicular alignment. 
For the intermediate case the restoring force is given by the derivative of 
energy with respect to zenith angle —dQ i HA i \o i =Q — —2K (with the same 
results for 6i = 7r), so for a small amount of canting away from the per- 
pendicular alignment the change in energy is Aii^. (6^) = 2K0i. For the 
case of canting suppression we have — ^.i7^J^=o — and so there is no 
force experienced for small spin canting. For the case of strong suppression 
—do^Ailoi^ — —QK and the restoring force is three times stronger than 
the equivalent quadratic anisotropy. 
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4.1 Results 



Normalizing against the strength of the dipole coupling to give dimen- 
sionless parameters we define T — {ksT) /Cd, J — J/Cd and K — K/Cd- 
The exchange coupling was fixed at J — 8.9 giving a ground state with 
stripe width w = 8. The anisotropy strength was fixed at K — 15. 
The system is initialized in a perpendicular striped state and then an en- 
semble is generated using the above parallel Monte Carlo algorithm with 
parameters L — 64, P — 64 and / = 32. While the energy of system con- 
verges rapidly (typically taking several thousand MC steps), the morpholog- 
ical properties of the system can take longer to emerge. Slow relaxation of 
stripe patterns has been observed in studies of Ising systems [58] . We find 
that after an equilibrium time of 10 5 Monte Carlo steps the order parameters 
we measure no longer have a time dependence. Once at equilibrium an ad- 
ditional 5 x 10 4 steps are simulated with the state recorded every 50 steps to 
form an ensemble. This ensures that the correlation between the same spin 
in subsequent states of the ensemble is limited. Waiting 50 steps corresponds 
to an average value of the time correlation (si(t = t ) • Si(t = t Q + 50)) « 0.5 
at T = 4. The ensemble size n — 1000 ensures that the standard error in 
the thermal averages of order parameter O 

SE(0) =(m_Biy> (13) 

remains smaller than the errors due to the algorithm discussed in section [3j 
Example states for each value of a are given in figures [7| [8] and [9j 
For the case of favored spin canting, a — —1, (Fig. [7| the system has 
stable stripes in the ground state, but the perpendicular magnetization is 
not saturated. The in-plane components display long range ordering. As 
temperature is increased the stripes display roughening, and then bridging 
leading to the eventual loss of orientational order. As temperature is further 
increased the in-plane order breaks into domains, before the system enters 
the high temperature paramagnetic phase. 

For the intermediate case, a — (Fig. [8]) we observe perpendicular stripes 
at low T. The boundaries of these stripes undergo roughening and eventual 
bridging, analogous to the canting favored case, before orientational order 
is destroyed with increasing temperature. Unlike the canting favored case 
we note the presence of increased canting at the domain boundaries. Above 
this temperature we observe a mixed phase in which perpendicular domains 
are interspersed with regions of in-plane magnetic order. As temperature is 
further increased these domains become increasingly granular until system 
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Figure 7: Example spin configurations for canting favored system(a = — 1). 
Each state is represented by three images, from left to right, showing the 
z,x and y components of spins respectively. Left column from top T — 
0.25, 1., 1.5, 2.0. Right column from top T = 2.5, 3.5, 6, 10. 

reaches the paramagnetic limit. 

When canting is suppressed, a = 2, (Fig. [9]) the system forms perpendic- 
ular stripes in the ground state. As the temperature is increased the walls 
undergo roughening but not the bridging and gradual loss of orientational 
order displayed in the a = and a = — 1 simulations. Instead the system 
undergoes a sudden transition into a state with only small regions of per- 
pendicular alignment remaining and strong in-plane order. As temperature 
is increased we observe an increasing number perpendicular regions as the 
in-plane order breaks into domains. At high temperature in-plane order is 
destroyed as the system becomes paramagnetic. 

In order to locate domain walls we define horizontal and vertical order pa- 
rameters based on those described by Whitehead et al. [131 EH] 

< = E* 

i 

< = E* 

i 

with analogous definitions for the x and y components of the spins. At low 
temperatures when the systems are dominated by perpendicular alignment 
the density of perpendicular domain boundaries is given by {n% + n^)/(4iV). 
In Fig[l0]we see that for the intermediate and canting favored states the wall 



sgn(sfsf + ^) 



(14) 



14 



Z X Y Z X Y 




Figure 8: Example spin configurations for the intermediate case (a = 0). 
Left column from top T — 0.5,2,2.5,3. Right column from top T — 
3.5,4.5,8,10. 



density increases gradually with temperature. This represents roughening 
and then the bridging before perpendicular order is lost and the wall density 
tends to the T —> oc value of 1/2. In contrast the canting suppressed state 
undergoes far less roughening and the wall density remains small until the 
system undergoes a sudden change into the high T state. 



4.1.1 Orientational Order Transition 



The orientational order is given by considering the ratio of horizontal and 
vertical domain boundaries. 



On 



nr. 



(15) 



for a E {x,y,z}. In Eq. 14 we have defined separate order parame- 



ters for each component rather than defining combined parameters n v = 
~ Si$i+y an d n h — Yli 1 — Si$i+x- This definition is robust against 
canting, meaning reduction in O due to changing stripe morphology (such 
as coarsening or bending) can be distinguished from changes in the cone 
angle. Consider the low T states shown in Fig. [7] and [9j both states have 
stripes with no coarsening or bending and the definition given in Eq. [14 



assigns the same order parameters to both states. In Fig. [TT] this O z is 
plotted for the three values of a. O x and O y were also calculated and were 
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Figure 9: Example spin configurations for the canting suppressed case (a = 
2). Left column from top T — 0.5,1.5,2.5,3.25. Right column from top 
T = 3.5,4,6,10. 



zero in all cases. For higher a the stripes are stabilized at higher T. For 
perpendicular stripes, high a values suppress small fluctuations at the stripe 
boundaries preventing the roughening and eventual bridging that leads to a 
loss of orientational order. 



4.1.2 Spin Reorientation Transition 

When the spins are not entirely perpendicular to the plane it is possible 
for the system to acquire in-plane ferromagnetic order. If we define M x — 
1/N J2i s i an d My = 1/N J2i 5 f then the parallel magnetization is given by 

M|| = ((M X 2 + M2)^). (16) 

the appearance of non-zero ferromagnetic order is not inherently indicative 
of global spin reorientation. Canted spins states and finite thickness domain 
walls can account for significant magnetic order |43j 160] . To measure the 
degree to which the spins reorient as a function of temperature we introduce 
the cone angle 77 

r ^#FlMf) (17) 

i 

where Q{ is the zenith angle of spin 0{ = arccos(sf) and we have normal- 



ized 77 so that < 77 < 1. In Fig. 12 77 is shown as a function of T, along 
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nh z +n v z 




2 3 4 5 6 7 
k B T C D - 1 

Figure 10: Total wall length for the canting favored a 
a = (I) and canting suppressed a = 2 (S) cases. 
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— 1 (F), intermediate 



with the T — ^ oo value 77 = x /(2/7r) 2 (l/4(7r 2 - 8)). Despite the energy dif- 
ference between parallel and perpendicular alignments remaining constant, 
the behavior varies dramatically with a. 

For the canting favored case, a = — 1, we see that by reducing the energy 
cost of canting spins, the sample doesn't experience a spin reorientation 
transition, instead it remains canted for all temperatures. In Fig. 13 it is 
seen that this canting allows the sample to have non zero magnetic order 
at T — 0. The appearance of maximum magnetic ordering at T — is 
characteristic of weaker values of tC < 13 and a = [35] . However in these 
cases, the low T parallel magnetization does not occur simultaneously with 
orientational order. As T is increased the degree of canting remains prac- 
tically unchanged with the orientational order decreasing over the range of 

1 < r < 2.5. 
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Orientational Order 
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Figure 11: The orientational order for favored canting a - 
diate canting a = (I) and suppressed canting a = 2 (S). 



-1 (F), interme- 



For the intermediate case, a — 0, we observe results consistent with those 
of Whitehead et al.[43j: a loss of orientational order over a small region, 
1 < T < 3, in which stripes also become slightly canted leading to a small 
in-plane magnetization. Above this temperature the in-plane magnetiza- 
tion increases as 77 decreases, resulting in a peak in-plane magnetization of 
Mm ~ .46 at T = 4.5. For T > 4.5 the a = system is identical to the 
a = — 1 system, 77 slowly decreasing and magnetic order gradually reduced 
to zero at T = 8. 

For the spin suppressed state, a = 2, we observe the same perpendicular 
ground state as in the a = case. Unlike the previous cases the orienta- 
tional order is stabilized up to T — 4, at which point the system undergoes 
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Cone Angle 




1 23456789 10 
k B T C D " 1 

Figure 12: 77 as a function of T for favored canting a = — 1 (F), intermediate 
canting a = (I) and suppressed canting a = 2 (S). 77 = 1 indicates all spins 
point perpendicular to the plane (si.z = ±1), 77 = indicates all spins lie 
in-plane (si.z = 0). The red dashed line represents the T —> 00 value. The 
solid blue line represents (s{.z — l/y/2). 



a sharp transition to a parallel ferromagnetic state. In Fig. 12 we see that 77 
simultaneously undergoes a sharp transition, representing the spin reorien- 
tation transition. Above T = 4 we observe a gradual reduction in magnetic 
order until the system enters the paramagnetic state above T — 8. 
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Figure 13: My as a function of T for favored canting a = — 1 (F), interme- 
diate canting a = (I) and suppressed canting a = 2 (S). 



5 Fluctuations 

In order to examine fluctuations near critical points we calculate the auto- 
correlation, 

a 2 (X) = ((X-(X)f), (18) 

of the three order parameters My, O z and 77, which we denote crjj, Gq and 
respectively. The small numerical values of these variances means that they 
are affected to a greater extent by errors introduced by the simultaneous 
flipping. 

In Fig. 14 we observe peaks in ai associated with the loss of in-plane 
magnetization for each choice of a. In addition we observe a smaller peak 
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at around T — 3 for the intermediate, a — 0, case corresponding to the 
formation of in-plane magnetic order. This low temperature peak is absent 
in the canting suppressed case due to the first order nature of the phase 
transition. The sharp in-plane transition does not correspond to a significant 
change in ferromagnetic ordering. In both the perpendicular striped phase 
and in-plane ferromagnetic state the exchange energy is minimized for the 
majority of spins. In the canting favored state the low T transition is absent 
since maximum magnetic order occurs at the lowest temperature simulated 
T = 0.2. 




1 23456789 10 
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Figure 14: a\\ as a function of T for favored canting a 
canting a = (I) and suppressed canting a = 2 (S). 
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case the 
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is plotted for the three choices of a. For the canted favored 



fluctuations are small, culminating in shallow peak at T — 5. This 
is consistent with the nearly homogeneous cone angle. For the intermediate 
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case the fluctuations displays a broad peak centered just below the minimum 
cone angle at T — 4. For the canting suppressed case there is a small peak 
corresponding to the initial reorientation of the spins followed by a broad 
peak as the cone angle starts to approach the high temperature average. 



In Fig. [16]the fluctuations of O z are plotted as a function of temperature. In 
each case the variance forms a peak corresponding to the loss of orienational 
order. Increased canting suppression corresponds to thinner peaks, as the 
transition occurs over a smaller temperature range. We note also that the 
strength of the peak is smaller for the canting suppressed case, due to the 
fact that only roughening occurs but not bridging. 




1 23456789 10 
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Figure 15: (j fj as a function of Tfor favored canting a = — 1 (F), intermediate 
canting a = (I) and suppressed canting a = 2 (S). 
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Figure 16: gq as a function of T for favored canting a = — 1 (F), intermediate 
canting a = (I) and suppressed canting a = 2 (S). 



6 Conclusions and Comments 

Here we have examined the nature of stripe formation and spin reorienta- 
tion in the presence of strong perpendicular anisotropy. In order to do so 
we have proposed an approximation that allows for a parallel algorithm for 
performing Monte Carlo simulations in cases where there is long range cou- 
pling. We have argued that, since the dipole coupling contributes with a 
r -3 dependence, for appropriate choice of algorithm parameters the approx- 
imation is acceptable. This algorithm reduces significantly the computation 
time associated with increased system size. The algorithm has been applied 
to the case of Ising spins in the presence of dipole coupling and shown to be 
consistent with results obtained by conventional methods. 
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It is possible that further increases in computational efficiency might be 
gained through more efficient parallelization of the dipole sum. Further- 
more the concept of simultaneous flipping might be particularly useful in 
three dimensions where the computational issues are enhanced, the caveat 
of course being that care must be taken with parameter choice since the 
number of simultaneous spins is greatly increased. 

This technique has then been used to examine the effects of higher order 
anisotropy in striped systems where we have shown that the anisotropy 
order can suppress or enhance the SRT in cases with strong out of plane 
anisotropy. 

When comparing our simulations to analytic results we note several dis- 
crepancies. Although we were able to observe the coexistence of stripes and 
magnetic order, we did not observe domain formation of the cone angle as 
suggested by Abanov [¥T] . We also did not observe any magnetic order due 
to correlations in the in-plane magnetization of domain walls in the wall 
segments dividing stripes. Whitehead et al. also noted the absence of such 
correlations [43J, however, Yafet and Gyorgy have argued that correlations 
between domain walls leading to ferromagnetic order are possible [60J. The 
discrepancy between analytic and computational results might largely be a 
result of limited system size. In particular it is not possible to make a fine 
grained examination of wall profiles in these small systems. The parallel 
algorithm presented here reduces the computational cost of scaling system 
size. In the future it would be interesting to use the algorithm to simulate 
far bigger systems, in which domain wall structure could be more reason- 
ably simulated, providing more insight into correlations that occur on finer 
scales. 
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A Pseudo Code 

In order to implement the parallel algorithm described requires addressing 
the potential update sites and the subset of sites with which they inter- 
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act as a function of the thread identifier. Here we give the pseudo code 
showing this addressing for the subsystems described above. The state is 
initialized on the host memory and consists of two one dimensional arrays 
of size I? . These arrays store the azimuthal (0) and zenith {&) angles. In 
what follows we will use Sj to represent values of both (p and 9 at some 
position j in these arrays. A number of potential updates sites are selected 
separated horizontally and vertically by a fixed number of sites which we 
denote /, i.e j = (al,bl) + jl . The interactions between array elements 
is calculated according to the algorithm below in which blockldx.x, block- 
Dim, x and threadldx.x are system integers that identify a thread address. 
In the following a\b is defined as a\b = floor(a/6). For example 7\3 = 2 and 
2 x (7\3) + 7mod3 = 7. 
ON HOST 

51 = Random Integer E [1,/] 

52 = Random Integer E [1,/] 
N = L\l 

for j = 1 ; j < N do 

Select a series of new states: 

9[j] = Random Real e [0, 2tt] 

cj)[j] = Arccos(Random Real E [—1,1]) 

Copy all variables arrays to GPU 
end for 

ON GPU: Exchange Interaction 

Tid = blockIdx.x*blockDim.x + threadldx.x; 

if T id < L 2 \l 2 then 
S x = S 1 + l{T id mo&N) 
S y = (S 2 + l(T id \N)) 

j — SyL + S X 

calculate the exchange coupling between sj and its immediate 
neighbors 

calculate the exchange coupling between (#[7], <j)[j}) and sj's im- 
mediate neighbors 
end if 

ON GPU: Dipole Interaction 

— blockIdx.x*blockDim.x + threadldx.x 

Sid = T id \P 

if S id < L 2 \l 2 then 

S x = Si + l(S id modN) 

S y = (S 2 + l(S id \N)) 
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j — SyL + S X 

a = (L + 1)\P 

p=((L + l)modP) - 1 

for k = -L\2 ; k < L\2 do 

for i = -L\2 + aT id modP) ; i < -L\2 + a(T id modP + 1) do 
j> = (((S y + k)modL)L) + (S x + i)modL 
Calculate the dipole coupling between Sj and Sji 
Calculate the dipole coupling between (0[j], 4>[j]) and sj> 
end for 
end for 
end if 

ON GPU: Single Site Energies and Spin Flips 
Tid = blockIdx.x*blockDim.x + threadldx.x; 

if T id < L 2 \l 2 then 
S x = 5i + J(r id modiV) 
S y = (S 2 + l(T id \N)) 

j — SyL + S X 

calculate anisotropy and Zeeman energies for sj 
calculate anisotropy and Zeeman energies 

replace sj with (0[j], cf)[j]) or sj according to the Boltzmann 
probability 
end if 

B Dipole Ising Model 

In section [3?2] it was argued that the error introduced by a single pass of the 
proposed GPU algorithm is bounded. However this does not ensure that the 
error is not compounded over a large number of passes leading to large sys- 
tematic errors. In order to investigate this possibility we consider the result 
of applying the algorithm when every accepted spin update corresponds to a 
complete reversal: Si = (0, 0, sf) — (0, 0, ±1). We select the same algorithm 
parameters that are used in section |4j L = 64 and / = 32. This corresponds 
to four simultaneous attempted spin flips for each cycle of the algorithm. 
Since each site is restricted to only two states the energy can be written 



In the presence of a sufficiently strong dipole interaction the ground state 
of the system will form a striped pattern of alternating spins. In order 




(19) 
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to ensure that dipole coupling contributes significantly to the total energy 
(increasing the error) parameters are selected such that J/Cp = 1.7, which 
corresponds to the thinnest possible stable stripe width h = 1. The order 
parameter for such a system is formed by considering a series of sub-lattices 
in the manner described by Binder and Landau |61j . For h = 1 the system 
is broken into four sub-lattices m\, horizontal stripes are described by m^ = 
mi + 77i2 — 777,3 — 777,4 and vertical stripes by m v — mi + 7774 — 7773 — 7772. 
The order parameter is then the staggered magnetization 777 s t = {{rn\ + 
ml) 1 / 2 ). In Fig. 17 this staggered magnetization is shown as a function 
of a normalized temperature T — ksTC^ 1 . The staggered susceptibility 




feTCr 



Figure 17: The staggered magnetization as a function of T showing the 
order disorder transition. 



% 100 




Figure 18: The staggered Susceptibility as a function of T. 



Xst = ^^(( m st) — ( m st) 2 ) is also calculated. By determining the location 
of the peak in Fig. 18 the critical temperature of the phase transition % 
can be determined, the result is shown in Table I. The peak lacks the 6- 
like structure associated with a first order transition, instead displaying an 
exponential decay consistent with a continuous phase transition [E 
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this is consistent with previous simulations with strong dipole coupling 
In Fig. 19 we show the probability distribution of the average energy per 
spin for various temperatures near the transition. At each temperature the 
distribution displays a single turning point, this is also consistent with the 
expected continuous transition [64J. 

Near to the critical temperature the system displays critical behavior. After 
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Figure 19: Probability distributions of energy near the critical point: T 
.833 Blue Circles; T = .85 Red Squares; T = .867 Yellow Diamonds; T 
.884 Green Triangles. 



rescaling temperature as t = L» |1 — T/T c \ one expects the following scaling 

relationships: m s t = L ^mo(t) and Xst — ^Xo(^) 5 where mo(t) and Xo(t) 
are universal scaling functions [65, 66J. Close to the critical point, when 
|1 — T/T c \ is small and L large, these universal scaling functions are expected 
to show a power law behavior mo(t) Bt@ and xo(t) ~~ ^ At -1 . In order to 
extract parameters from simulation one usually makes use of the universality 
of mo(t) and xo(t) and simulates the system for a number of different sizes 
before varying the parameters until the results of all simulations lie on a 
single curve [67, 66J. However, performing such analysis would not be a 
suitable test of the algorithm, for any choice L < I there is no simultaneous 
flipping and hence no approximation is being made. If one were to change 
the size of / to simulate smaller systems the degree of approximation would 
be changed (the error will be increased for decreasing /). Instead we use 
the value of the scaling constant v calculated by Rastelli et al. to rescale 
temperature and then use the power law dependence of the system near 
T c to extract f3 and 7. A comparison of the critical properties is given 
in Table I. The use of the GPU has introduced some error in the critical 
parameters of the system, the largest error being slightly less than 8%. 
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Table 1: Various parameters calculated using the GPU algorithm and con- 
ventional techniques [66] 



Parameter 


GPU algorithm 


Conventional MC 


T c 


.85 


.82 


7 


1.62 


1.75 




0.08 


0.08 



Previous simulations performed on GPUs have noted error averaging 5% in 
the calculation of dipole energy due to the low level of numerical precision 
available on GPUs [52J. In light of this reduced precision the additional 
error introduced by multiple spin flips is acceptable. 
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